Research
Hardware, e.g., CPUs, storage and memory technology, evolves at a rapid pace. Understanding in detail the characteristics of new hardware, such as the read/write performance of new storage technology, is key to adapting and optimising data analysis algorithms.
In this line of work we therefore develop and optimise algorithms that enable the efficient and scalable analysis of large amounts of data on novel hardware. We focus in particular on new ideas around storage, e.g., cold storage devices or shingled magnetic recording (SMR) disks for archiving data and analysing it occasionally, and computing, e.g., neuromorphic hardware as a scalable and energy-efficient analytics platform.
Applications
Scientists across different disciplines produce vast amounts of data through experimentation and simulation. While the amounts of data produced are already so big that they can barely be managed, the problem is certain to become worse as more and more data is generated and collected. A lot of our research is therefore driven by the needs of scientists in general and neuroscientists in particular.
We address the problems of neuroscientists in their quest to understand and simulate the rat brain. More specifically, we work with neuroscientists in the Human Brain Project (http://humanbrainproject.eu) to manage the vast amounts of data they use and produce. Their research, modelling and simulating a fraction of the rat brain, produces terabytes of data. Current solutions are inadequate to manage this data volume and we are thus investigating new methods to index and store it in order to provide efficient and scalable access. A particular problem we are currently addressing is the retrieval of objects in space, i.e., accessing neurons based on their position. While it is simple to index several thousand neurons, the neuroscientists have to do it for several million or even billions of neurons. We are developing new spatial indexes to solve this problem.
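To make the problem of accessing neurons by position concrete, here is a minimal Python sketch of a spatial range query over a uniform grid. It is only an illustration of the general idea and not one of the indexes mentioned above; the class `GridIndex`, the parameter `cell_size` and the sample neuron ids are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import product
from math import floor


class GridIndex:
    """Uniform grid over 3D points (illustrative only; cell_size is an assumed tuning knob)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # Maps integer cell coordinates to a list of (neuron_id, position) pairs.
        self.cells = defaultdict(list)

    def _cell(self, position):
        # Integer coordinates of the grid cell containing the position.
        return tuple(floor(c / self.cell_size) for c in position)

    def insert(self, neuron_id, position):
        self.cells[self._cell(position)].append((neuron_id, position))

    def range_query(self, low, high):
        """Return ids of all points with low <= position <= high in every dimension."""
        lo_cell, hi_cell = self._cell(low), self._cell(high)
        axes = [range(a, b + 1) for a, b in zip(lo_cell, hi_cell)]
        hits = []
        # Visit only the cells overlapping the query box, then filter exactly.
        for cell in product(*axes):
            for neuron_id, pos in self.cells.get(cell, ()):
                if all(l <= p <= h for l, p, h in zip(low, pos, high)):
                    hits.append(neuron_id)
        return hits


index = GridIndex(cell_size=10.0)
index.insert("n1", (1.0, 2.0, 3.0))
index.insert("n2", (15.0, 2.0, 3.0))
index.insert("n3", (55.0, 60.0, 70.0))
result = sorted(index.range_query((0.0, 0.0, 0.0), (20.0, 10.0, 10.0)))
```

A query only touches the grid cells that overlap the query box, which is what keeps it cheap for a few thousand points. With millions or billions of unevenly distributed neurons, a single fixed cell size tends to leave some cells overfull and many empty, which is one reason more sophisticated spatial indexes are needed.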
Improving mobility and decreasing congestion are some of the biggest challenges facing cities today. Congestion impacts the daily lives of commuters, as well as businesses and visitors to any city. Sensors, the Internet of Things (IoT), GPS data and other sources of data provide city planners with a wealth of data. The data contains important hints to develop smart transport solutions that reduce congestion as well as to optimise the use of city public transport. Extracting the information and hints in this deluge of data, however, is a challenge due to the size as well as the number of heterogeneous sources.
We work with transport authorities to address these issues. More precisely, we develop the infrastructure to integrate and analyse heterogeneous data sources (data with a spatial aspect, e.g., sensors, GPS, maps, weather radar and others) to enable spatial analytics on it. Spatial analytics is used for applications like city planning and to optimise the use of limited road space (even in real time).
Demos & Visuals
We turn as much of our technology as possible – research projects and student projects – into cool demos! Check out our super whizz bang videos of some of our applications below.
This video shows our ground-breaking approach to visualising and analysing large-scale scientific models. Using an HTC Vive headset as well as haptic gloves, users can immerse themselves in a detailed model of the brain and analyse it in virtual reality. Walking through the model to look at different parts of the brain, they can use gestures to pan, zoom and select subsets to study the model in great detail. The video shows a first demo of the visualisation, which can be extended to visualise other models and can also support sophisticated analyses. This particular video shows the message propagation mode: after a subset of the model is selected, messages are injected into the branches crossing that subset. The messages travel along the branches of the neurons and leap between them. The visualisation helps to understand the connectivity of the brain model.
Sensors are becoming ever more pervasive and more powerful, meaning that we can perform increasingly complex tasks on them. This video shows our demo of a wearable device which collects data and classifies it using a neural network on the wearable device itself. The neural network is optimised for size (and thus accuracy of classification) as well as energy efficiency. This particular demo classifies physical exercises (e.g., push-ups) and reports the results on a mobile phone. The sensor on the body uses an inertial measurement unit to collect acceleration data. The data is classified on the wearable device using a neural network and is sent to a mobile phone using Bluetooth. The phone gives the user feedback on the quality and quantity of the different exercises done.
Publication Highlights
Team & Jobs
Ph.D. Funding Opportunities
We are always looking for driven and talented Ph.D. applicants interested in developing novel data management techniques that are deployed and used across different disciplines. We are particularly looking for students with a strong background in data management and ideally also with a background in a different field (life sciences, natural sciences, etc.).
Most funding opportunities are for European Union students, but there are several opportunities for overseas applicants as well:
- for all students: Imperial College PhD Scholarships
- for EU students: EPSRC Doctoral Training Account Studentships, Department of Computing EU/International PhD Scholarship and Department of Computing Doctoral Teaching Scholarship
- for Chinese students: CSC Imperial Scholarships
- for Taiwanese students: Top University Strategic Alliance
- for Indian students: Imperial College India Foundation PhD Scholarships
- for Singaporean students: Joint Programme with National University of Singapore
You can find more information about the PhD program in the Department of Computing at Imperial College London, including funding opportunities here.
2017
eTRIKS Analytical Environment: A Modular High Performance Framework for Medical Data Analysis. Conference. Proceedings of the IEEE Big Data Conference, Boston, MA, USA, December 10-14, 2017, 2017.
Provenance Storage. Book Chapter. Encyclopedia of Database Systems, Encyclopedia of Database Systems, Springer, 2017.
Efficient Mining of Regional Movement Patterns in Semantic Trajectories. Journal Article. PVLDB, 10 (1), 2017.
Neuromorphic Hardware As Database Co-Processors. Conference. CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017.
BLOCK: Efficient Execution of Spatial Range Queries in Main-Memory. Conference. Proceedings of the 29th International Conference on Scientific and Statistical Database Management, Chicago, IL, USA, June 27-29, 2017, 2017.
ADvaNCE - Efficient and Scalable Approximate Density-Based Clustering Based on Hashing. Journal Article. Informatica, 28 (1), pp. 105–130, 2017, ISBN: 1822-8844.
STATS - A Point Access Method for Multidimensional Clusters. Inproceedings. Database and Expert Systems Applications - 28th International Conference, DEXA 2017, Lyon, France, August 28-31, 2017, Proceedings, Part I, pp. 352–361, 2017.
Data Infrastructure for Medical Research. Book. 2017, ISSN: 1931-7883.
2016
Hashing-Based Approximate DBSCAN. Conference. Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Prague, Czech Republic, August 28-31, 2016, Proceedings, 2016.
TRANSFORMERS: Robust Spatial Joins on Non-uniform Data Distributions. Conference. 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016, 2016.
Space Odyssey: Efficient Exploration of Scientific Data. Conference. Proceedings of the Third International Workshop on Exploratory Search in Databases and the Web, San Francisco, CA, USA, July 1, 2016, 2016.
An Efficient Parallel Load-Balancing Framework for Orthogonal Decomposition of Geometrical Data. Conference. High Performance Computing - 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, 2016.
2015
On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences. Journal Article. SIGMOD Record, 44 (2), pp. 23–28, 2015.
Configuring Spatial Grids for Efficient Main Memory Joins. Conference. Data Science - 30th British International Conference on Databases, BICOD 2015, Edinburgh, UK, July 6-8, 2015, Proceedings, 2015.
Towards the Identification of Disease Signatures. Conference. Brain Informatics and Health - 8th International Conference, BIH 2015, London, UK, August 30 - September 2, 2015, Proceedings, 2015.
Just-In-Time Data Virtualization: Lightweight Data Management with ViDa. Conference. CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, 2015.
Reconsolidating Data Structures. Conference. Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015, 2015.
THERMAL-JOIN: A Scalable Spatial Join for Dynamic Workloads. Conference. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, 2015.
RUBIK: Efficient Threshold Queries on Massive Time Series. Conference. Proceedings of the 27th International Conference on Scientific and Statistical Database Management, SSDBM '15, La Jolla, CA, USA, June 29 - July 1, 2015, 2015.
2014
Data analysis: Approximation aids handling of big data. Journal Article. Nature, 515 (7526), pp. 198, 2014.
Spatial Data Management Challenges in the Simulation Sciences. Conference. Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, 2014.
OCTOPUS: Efficient query execution on dynamic mesh datasets. Conference. IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, March 31 - April 4, 2014, 2014.
2013
Enabling Scientific Discovery Via Innovative Spatial Data Management. Journal Article. IEEE Data Eng. Bull., 36 (4), pp. 3–10, 2013.
Computational Neuroscience Breakthroughs through Innovative Data Management. Conference. Advances in Databases and Information Systems - 17th East European Conference, ADBIS 2013, Genoa, Italy, September 1-4, 2013, Proceedings, 2013.
Accelerating Spatial Range Queries. Conference. Joint 2013 EDBT/ICDT Conferences, EDBT '13 Proceedings, Genoa, Italy, March 18-22, 2013, 2013.
TOUCH: In-memory Spatial Join by Hierarchical Data-oriented Partitioning. Conference. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, 2013.
Data-driven Neuroscience: Enabling Breakthroughs via Innovative Data Management. Conference. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, 2013.
GIPSY: Joining Spatial Datasets with Contrasting Density. Conference. Conference on Scientific and Statistical Database Management, SSDBM '13, Baltimore, MD, USA, July 29 - 31, 2013, 2013.
2012
SCOUT: Prefetching for Latent Feature Following Queries. Journal Article. PVLDB, 5 (11), pp. 1531–1542, 2012.
Accelerating Range Queries for Brain Simulations. Conference. IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012, 2012.
SCOUT: Prefetching for Latent Feature Following Queries. Journal Article. CoRR, abs/1208.0276, 2012.
2011
Challenges and Opportunities in Self-Managing Scientific Databases. Journal Article. IEEE Data Eng. Bull., 34 (4), pp. 44–52, 2011.
2010
PARINDA: an Interactive Physical Designer for PostgreSQL. Conference. EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings, 2010.
2008
2008 International Conference on Autonomic Computing, ICAC 2008, June 2-6, 2008, Chicago, Illinois, USA, 2008.
Efficient Lineage Tracking for Scientific Workflows. Conference. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, 2008.
2007
Autonomic resource provisioning for software business processes. Journal Article. Information & Software Technology, 49 (1), pp. 65–80, 2007.
2006
JOpera: Autonomic Service Orchestration. Journal Article. IEEE Data Eng. Bull., 29 (3), pp. 32–39, 2006.
Developing scientific workflows from heterogeneous services. Journal Article. SIGMOD Record, 35 (2), pp. 22–28, 2006.
Mirroring Resources or Mapping Requests: Implementing WS-RF for Grid Workflows. Conference. Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006, Singapore, 2006.
2005
Publishing Persistent Grid Computations as WS Resources. Conference. First International Conference on e-Science and Grid Technologies (e-Science 2005), 5-8 December 2005, Melbourne, Australia, 2005.
Design and Evaluation of an Autonomic Workflow Engine. Conference. Second International Conference on Autonomic Computing (ICAC 2005), 13-16 June 2005, Seattle, WA, USA, 2005.
Autonomic Execution of Web Service Compositions. Conference. 2005 IEEE International Conference on Web Services (ICWS 2005), 11-15 July 2005, Orlando, FL, USA, 2005.
PostDoc Funding Opportunities
If you are an experienced researcher interested in collaborating with our group on an externally funded scholarship, for example a Marie-Curie post-doctoral fellowship, please get in contact with us. Several other opportunities for fellowships (including partial ones) are listed below. Please send a message to t.heinis@imperial.ac.uk if you are interested.
Applying for a Ph.D. position
We are looking for aspiring researchers who want to pursue a Ph.D. (in 3 to 3.5 years) in the broad area of scientific data management. The group focuses on scientific data management and high-impact interdisciplinary research, i.e., developing groundbreaking and novel data management techniques that are strongly motivated by and used in other disciplines (see examples of past research here and demos here). The research interests of a successful applicant have to overlap considerably with the group's interests:
- Big Data, Distributed Indexing & Processing
- Scientific Data Management
- Spatial Data, Spatial Indexing
- Spatio-Temporal Indexing
- High-dimensional Indexing/Clustering
- In-Memory Indexing
To apply you will need to have a strong background in computer science (M.Sc. or B.Sc. in Computer Science or very closely related) and ideally solid experience with data management. Given the interdisciplinary nature of our group's research, the ideal candidate also has a background in a different discipline.
You must have excellent communication skills and be able to prioritise work to meet deadlines. All applicants must be fluent in spoken and written English. Preference will be given to applicants with publications in the relevant areas.
How to apply: please send a message to t.heinis@imperial.ac.uk
Applications must include the following:
- A full CV
- Scans of the transcripts of your studies
- Contact information for 2 references who have agreed to speak about you, your work, and your potential
About Imperial College and London
Imperial College is a first-class address for pursuing excellent, high-impact research. Imperial College consistently ranks among the top 5 schools in the world (Times Higher Education & QS rankings). The Department of Computing is also a leading department of Computer Science among UK universities. It has consistently been awarded the highest research rating (5*) in Research Assessment Exercises (RAE), coming 2nd in the 2008 RAE, and was rated as "Excellent" in the previous national assessment of teaching quality.
Noisy, vibrant and truly multicultural, London is a megalopolis of people, ideas and frenetic energy. The capital and largest city of both the United Kingdom and of England, it is also the largest city in Western Europe and the European Union. Situated on the River Thames, London is an international capital of culture, music, education, fashion, politics, finance and trade which offers ample activities (besides research, that is) for every interest, be it culture, sporting events, shopping or clubbing.